1. Pentium III Processor


1.1  Introduction  to  the  Pentium  III  Processor. 

The Pentium III processor is the newest, and probably the last, processor in Intel's IA-32 architecture.
Its debut in late February 1999 was surrounded by controversy, not over its enhancements to its
predecessor, the Pentium II processor, but over Intel's decision to embed an ID number in the chip. The ID
number is supposed to improve the security of e-commerce, but it can also be used to track people on the
internet for marketing or for other, malicious purposes. Nevertheless, the new design contains quite a few
important changes.

1.2  Design  of  the  Pentium  III  Processor. 

The Pentium III processor advances over its predecessors not only in clock speed but also in design.
It is offered at clock speeds from 450MHz to 733MHz and supports either a 100MHz or a 133MHz system bus
(the 133MHz system bus is only available on the 733MHz, 667MHz, 600MHz, and 533MHz systems). The system
bus frequency is selected by the BSEL[1:0] signals, which are determined by the processor and the
frequency synthesizer (see table 1).


BSEL1   BSEL0   System Bus Freq.
0       0       66 MHz (unsupported)
0       1       100 MHz
1       0       reserved
1       1       133 MHz

Table 1
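
The selection in table 1 can be written out as a small lookup. The following is only a sketch of the table, not of the actual hardware logic; the function name decode_bsel is a hypothetical helper introduced for illustration.

```python
def decode_bsel(bsel1, bsel0):
    """Return the system bus frequency in MHz selected by BSEL[1:0].

    Follows table 1: only the 100 MHz and 133 MHz encodings are usable.
    The unsupported and reserved encodings raise ValueError.
    """
    if bsel1 not in (0, 1) or bsel0 not in (0, 1):
        raise ValueError("BSEL signals must be 0 or 1")
    code = (bsel1 << 1) | bsel0  # BSEL1 is the high bit
    if code == 0b01:
        return 100
    if code == 0b11:
        return 133
    if code == 0b00:
        raise ValueError("66 MHz is listed as unsupported")
    raise ValueError("encoding 10 is reserved")


print(decode_bsel(0, 1))  # 100
print(decode_bsel(1, 1))  # 133
```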


The level 1 cache consists of a 16kB non-blocking data cache and a 16kB non-blocking instruction cache
(making up 32kB of L1 cache). The processor uses a Dual Independent Bus (DIB), which frees the system bus
of level 2 cache traffic by putting the level 2 cache on its own dedicated, high-speed bus.

There are two types of level 2 cache available: the Discrete Cache and the Advanced Transfer
Cache. The Discrete Cache uses commercially available parts. It is composed of an external TagRAM and
a burst pipelined synchronous static RAM, and its size is 512kB (see figure 1a). The Advanced Transfer
Cache does not use commercially available parts (which means it is much more expensive) and resides on
the processor. Its size is 256kB (see figure 1b).


Figure  1 


The Pentium III processor also has a dynamic execution microarchitecture, which combines multiple
branch prediction, data flow analysis, and speculative execution. Multiple branch prediction predicts
program execution through several branches. Data flow analysis creates an optimized schedule of
instructions by analyzing the data dependencies between them. Speculative execution keeps the
processor's superscalar execution units busy by executing instructions speculatively, based on the
optimized schedule.

Along with upgrading the design for data and instruction flow and execution, Intel also enlarged the
instruction set. The Internet Streaming SIMD Extensions are instructions added to enhance video, sound,
and 3-D rendering, which are common tasks in internet surfing as well as in other areas of computing.
Building on the MMX technology and SIMD instructions available on earlier IA-32 processors, Intel hopes
to corner the market with these new instructions by having software companies write code that uses
them. The 70 new Streaming SIMD Extension instructions include single-instruction, multiple-data (SIMD)
floating-point instructions.
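
To make the SIMD idea concrete, here is a minimal sketch of one packed single-precision addition. It assumes four 32-bit floating-point lanes per operand, which the text above does not state, and the helper names to_single and packed_add_single are hypothetical. It models the effect of one instruction in plain Python and says nothing about how the hardware implements it.

```python
import struct


def to_single(x):
    """Round a Python float to IEEE single precision (32-bit)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def packed_add_single(a, b):
    """Add two four-lane vectors lane by lane, as one SIMD instruction would.

    Assumption for illustration: each operand holds exactly four
    single-precision lanes.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("each operand must hold exactly four lanes")
    return [to_single(to_single(x) + to_single(y)) for x, y in zip(a, b)]


print(packed_add_single([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]))
# [1.5, 2.5, 3.5, 4.5]
```

A scalar floating-point unit would need four separate add instructions to produce the same four results.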

Other features of the Pentium III processor include a pipelined floating-point unit that supports the
80-bit format as well as the IEEE standard 32-bit and 64-bit formats, up to 4GB of addressable cacheable
memory space, and system memory expandable up to 64GB of physical memory.
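
The two memory limits follow directly from the number of address bits. As a quick check, assuming byte addressing and that the 64GB figure corresponds to 36 physical address bits (an inference from the numbers above, not something the text states):

```python
GB = 2 ** 30  # bytes in one gigabyte (binary)


def addressable_gb(address_bits):
    """Size in GB of a byte-addressable space with the given address width."""
    return (2 ** address_bits) // GB


print(addressable_gb(32))  # 4
print(addressable_gb(36))  # 64
```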


2.  The  IA-64  (Merced)  Processor 

2.1  Introduction  to  the  IA-64  Processor 

Intel's goals in designing the IA-64 (Merced) processor were to create an architecture that could lead
the industry in performance, leave room to extend the chip over the next few decades, and maintain full
hardware compatibility with the IA-32. To do so, Intel decided to abandon its old architecture for its
high-end processors. The new processor takes a few pages from CISC, RISC, and VLIW. The first processor
in the IA-64 family, code-named Merced, will be sold under the name Itanium and is due to be released in
the second or third quarter of 2000.

2.2  Design  of  the  IA-64  (Merced)  Processor 

The Merced processor is a chip with 64-bit memory addressing. This makes it better suited than its
predecessors to the needs of data warehousing companies and e-businesses, which are common users of
workstations and servers. Many of the innovative design features in the Merced processor are intended to
improve instruction-level parallelism through speculation, prediction, larger register files, and an
advanced branch architecture. Speculation allows data to be loaded early, even ahead of branches or
possibly conflicting stores, so that the data is already on hand when it is needed rather than being
fetched from memory at that moment. Parallelism is arranged by software at compilation: the compiler
analyzes the code and optimizes the structure of the machine code before the processor executes it. The
advanced branch architecture lets the compiler remove unneeded branches through new instruction formats.
When branching is necessary, the processor uses a branch register to hold the target address for
indirect branches. For control loops and modulo-scheduled loops, it uses a loop-closing branch, which
provides perfect prediction.


The instruction formats are designed for two classes of code: 32-bit code written for older IA-32
processors and 64-bit code written specifically for the Merced processor. Within the IA-64 system
environment, the processor can execute code from either instruction set or a combination of the two.
This is done by adding three special instructions and an interrupt to the instruction formats (see
figure 2).


[Figure 2 diagram labels: jmpe, br.ia, IA-32 Instruction Set, rfi, Interrupt or exception]

Figure 2


The jmpe instruction is an IA-32 instruction that jumps to an IA-64 target instruction and changes the
instruction set to the IA-64 format. The br.ia instruction is an IA-64 instruction that branches to an
IA-32 target instruction and changes the instruction set to the IA-32 format. Interrupts transition the
processor to the IA-64 instruction set to handle the interrupt request. The rfi instruction ("return
from interrupt") is an IA-64 instruction that changes the instruction set back to the IA-32 or IA-64
format, depending on which was in use before the interrupt.
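
The transitions described above amount to a small state machine. The sketch below models only the active instruction set; the class name and the use of a stack to remember the interrupted mode are assumptions made for illustration and are not taken from the architecture definition.

```python
IA32 = "IA-32"
IA64 = "IA-64"


class InstructionSetMode:
    """Toy model of which instruction set the processor is executing."""

    def __init__(self, mode=IA64):
        self.mode = mode
        self._saved = []  # modes to restore on rfi (assumed to nest)

    def jmpe(self):
        # jmpe is an IA-32 instruction that jumps to IA-64 code.
        if self.mode != IA32:
            raise RuntimeError("jmpe is only valid in IA-32 mode")
        self.mode = IA64

    def br_ia(self):
        # br.ia is an IA-64 instruction that branches to IA-32 code.
        if self.mode != IA64:
            raise RuntimeError("br.ia is only valid in IA-64 mode")
        self.mode = IA32

    def interrupt(self):
        # Interrupts are always handled in the IA-64 instruction set.
        self._saved.append(self.mode)
        self.mode = IA64

    def rfi(self):
        # Return from interrupt: restore whichever set was active before.
        if self.mode != IA64 or not self._saved:
            raise RuntimeError("rfi needs a pending interrupt in IA-64 mode")
        self.mode = self._saved.pop()


cpu = InstructionSetMode()
cpu.br_ia()       # IA-64 -> IA-32
cpu.interrupt()   # handler runs in IA-64
cpu.rfi()         # back to the interrupted IA-32 code
print(cpu.mode)   # IA-32
```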

An important feature of the way IA-64 instructions are handled is the Instruction Bundle Format.
Similar to the VLIW format, the Instruction Bundle Format packs 3 instructions into a bundle to be
processed at one time (see figure 3). The processor splits up the instruction bundle and processes each
instruction concurrently to enhance the parallelism in the Merced chip. The instruction bundle is read
in little endian format.


Bits:    127-87               86-46                45-5                 4-0
Field:   Instruction slot 2   Instruction slot 1   Instruction slot 0   Template
Width:   41 bits              41 bits              41 bits              5 bits

Instruction Bundle Format


Figure 3
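
The bundle layout can be exercised with a few shifts and masks. The following sketch packs and unpacks a 128-bit bundle using the field positions from figure 3 (template in bits 4-0, slots at bits 45-5, 86-46, and 127-87). The function names are hypothetical, and the slot and template values in the example are arbitrary placeholders rather than real instruction encodings.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Build the 16 bytes of a bundle from a template and three slot values."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instruction slots")
    value = template & TEMPLATE_MASK
    for i, slot in enumerate(slots):
        value |= (slot & SLOT_MASK) << (TEMPLATE_BITS + SLOT_BITS * i)
    return value.to_bytes(16, "little")  # bundles are read little endian


def unpack_bundle(raw):
    """Split 16 little endian bytes into (template, [slot0, slot1, slot2])."""
    if len(raw) != 16:
        raise ValueError("a bundle is exactly 128 bits (16 bytes)")
    value = int.from_bytes(raw, "little")
    template = value & TEMPLATE_MASK
    slots = [
        (value >> (TEMPLATE_BITS + SLOT_BITS * i)) & SLOT_MASK
        for i in range(3)
    ]
    return template, slots


raw = pack_bundle(0x10, [1, 2, 3])
print(unpack_bundle(raw))  # (16, [1, 2, 3])
```

Note that 5 + 3 x 41 = 128, so the three slots and the template fill the bundle exactly.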


Memory on the Merced processor is accessed only through load, store, and semaphore instructions, as in
a typical RISC-style architecture. Memory is byte addressable and is accessed with 64-bit pointers only,
so 32-bit pointers from IA-32 code will have to be converted into a 64-bit format. Data can be stored in
either big endian or little endian byte order. In the User Mask, the UM.be bit determines whether little
endian or big endian format is used to store the data.
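
The effect of the byte-order setting is easy to see on a single 64-bit value. In this sketch the big_endian parameter stands in for the User Mask bit described above; store64 and load64 are hypothetical helper names, not instructions.

```python
def store64(value, big_endian):
    """Return the 8 bytes that a 64-bit store would write to memory."""
    return value.to_bytes(8, "big" if big_endian else "little")


def load64(raw, big_endian):
    """Interpret 8 bytes of memory as a 64-bit value."""
    return int.from_bytes(raw, "big" if big_endian else "little")


value = 0x0102030405060708
print(store64(value, big_endian=False).hex())  # 0807060504030201
print(store64(value, big_endian=True).hex())   # 0102030405060708

# Reading data back with the wrong setting scrambles the value.
wrong = load64(store64(value, big_endian=True), big_endian=False)
print(hex(wrong))  # 0x807060504030201
```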


3. Differences Between the Pentium III and the IA-64 Processors

Besides the obvious difference between the 64-bit memory addressing of the IA-64 and the 32-bit memory
addressing of the Pentium III (as shown in their processor family names), there are a number of other
differences between the two processors. One of the most visible is the instruction set. The Pentium III
processor has a Complex Instruction Set Computer (CISC) format: instructions other than load and store
can access memory, and the instruction format is variable length. The IA-64 (Merced) processor has an
Explicitly Parallel Instruction Computing (EPIC) format. It is a load/store architecture in which only
load and store instructions can access main memory. It also has shorter, fixed-length instructions than
the Pentium III. Similar to the Very Long Instruction Word (VLIW) format, EPIC instructions are bundled
in threes. This makes it much more efficient than the Pentium III at executing instructions in parallel.
The EPIC format has been shown to execute 6 instructions per clock, where the Pentium III executes 1.5
to 2 instructions per clock. The Merced processor can still run the IA-32 instruction set, but to
benefit from the full parallelism the chip is capable of, code must be in IA-64 format.

Another important part of optimizing parallel execution is the way the CPU handles decision points.
The IA-64 depends more on the compiler at decision points than the Pentium III does. It achieves perfect
prediction for control loops and modulo-scheduled loops as well as better prediction for indirect
branches. The compiler works out what is to be executed at a branch before the hardware sees it. This is
accomplished by providing special branch instructions in the Merced's instruction set.


4.  What  Do  These  Differences  Mean  to  the  Consumer? 

To the IS manager buying an IA-64 (Merced) networking computer, the differences from the Pentium III
are significant. The addressable memory is much larger (64-bit versus 32-bit addressing), which gives
data warehouses and e-businesses the ability to keep information in memory well above the 4GB limit. The
system bus on the IA-64 is predicted to run faster than that of the Pentium III (200MHz or more).
Instructions will be processed faster through advanced prediction, speculation, and scheduling, which
together produce a program optimized for parallel execution. Support for both little endian and big
endian data storage will reduce the amount of program code needed to convert data. It will also reduce
the complexity of communication between machines on different platforms. Internet speed will increase
due to special instructions that treat a general register as eight 8-bit, four 16-bit, or two 32-bit
elements, all processed in parallel and independently of one another. This will speed up the processing
of multimedia data, which, in turn, will speed up the processing of internet data.
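
The parallel element operations mentioned above can be illustrated with a packed addition on a 64-bit register value. This is a sketch only: the function name is hypothetical, and wraparound (modular) arithmetic in each lane is an assumption chosen to show that lanes do not carry into one another.

```python
def packed_add_u64(a, b, lane_bits):
    """Add two 64-bit register values lane by lane.

    lane_bits selects eight 8-bit, four 16-bit, or two 32-bit elements.
    Each lane wraps around on overflow and never carries into its neighbor.
    """
    if lane_bits not in (8, 16, 32):
        raise ValueError("lane_bits must be 8, 16, or 32")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        lane = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        result |= (lane & lane_mask) << shift
    return result


a = 0x00FF00FF00FF00FF
b = 0x0001000100010001
print(f"{packed_add_u64(a, b, 16):016x}")  # 0100010001000100
print(f"{packed_add_u64(a, b, 8):016x}")   # 0000000000000000
```

The same two inputs give different results depending on the element width, because the overflow of each 8-bit lane is discarded instead of carrying into the next byte.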

Another important issue the IS manager must contend with is the amount of software and hardware
compatible with the new Merced processor. All high-end computer system vendors except Sun Microsystems
have committed themselves to supplying computers with the Merced processor. Some vendors, including HP
and SGI, have decided to scrap their other systems and fully transition their computers to the Merced
processor. Many operating systems will also support the Merced processor. Those committed to this
support include Microsoft, Novell, Linux, and five other UNIX operating system companies. Server and
workstation application companies (such as Oracle, IBM, and Microsoft) have also committed to supporting
applications that are compliant with the Merced processor.

For the general PC user, internet surfing and program execution will speed up, provided software
companies use the IA-64 instruction set when redeveloping their programs. Unfortunately, the Merced
processor will be far too expensive for the computer hobbyist who uses the machine to surf the internet
or to run minor programs. At first, the majority of programs designed to use the IA-64 architecture will
be for large projects and will be expensive. A year or so after the Merced chip is introduced, software
companies will develop IA-64 code for the common home user. Even then, the typical PC at home will not
be able to run the new code and thus will not benefit from the increased performance possible with the
Merced processor.


5.  What  About  Software  Companies? 

As stated above, many software companies have committed to producing software applications and
operating systems for the IA-64 (Merced) processor. However, to take advantage of the full power of the
processor, companies will have to do some major rewriting of code. The IA-32 instruction set will still
be available to the software developer, but much of the new IA-64 instruction set is designed to take
advantage of the processor. 32-bit pointers to memory addresses will have to be converted to 64-bit
format. Transition instructions (such as jmpe, br.ia, and rfi) will have to be inserted to move between
the IA-32 and IA-64 instruction sets. Scheduling will have to be done at compilation time rather than by
letting the hardware run microinstructions to schedule. Branch prediction will also be handled by the
compiler, with new options. To optimize parallelism, compilers must bundle instructions carefully so as
not to degrade the schedule.
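
As a rough illustration of what bundling at compile time involves, the sketch below greedily groups a straight-line instruction list into bundles of up to three, starting a new bundle whenever an instruction depends on one already in the current bundle. It is a deliberately simplified model: it treats every bundle as a set of mutually independent instructions, does not reorder code, and ignores templates and padding. The Instr type and the register names are hypothetical.

```python
from collections import namedtuple

# name: label for printing; dest: register written; srcs: registers read
Instr = namedtuple("Instr", ["name", "dest", "srcs"])


def bundle_in_order(instrs, width=3):
    """Group instructions, in program order, into bundles of independent ops."""
    bundles = []
    current = []
    for ins in instrs:
        written = {i.dest for i in current}
        read = {s for i in current for s in i.srcs}
        conflict = (
            ins.dest in written                      # write after write
            or ins.dest in read                      # write after read
            or any(s in written for s in ins.srcs)   # read after write
        )
        if current and (conflict or len(current) == width):
            bundles.append(current)
            current = []
        current.append(ins)
    if current:
        bundles.append(current)
    return bundles


program = [
    Instr("load r1", "r1", ["r10"]),
    Instr("load r2", "r2", ["r11"]),
    Instr("add r3", "r3", ["r1", "r2"]),
    Instr("load r4", "r4", ["r12"]),
    Instr("mul r5", "r5", ["r3", "r4"]),
]
for bundle in bundle_in_order(program):
    print([ins.name for ins in bundle])
# ['load r1', 'load r2']
# ['add r3', 'load r4']
# ['mul r5']
```

A compiler that is also allowed to reorder instructions could hoist "load r4" into the first bundle and fill it completely, which is the kind of careful bundling the paragraph above refers to.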

Software companies will direct their development of Merced processor code toward high-end uses such as
workstations and data warehouses. Typical PC users will not see IA-64 code for a while, since the market
is not sure where the Merced chip will be in the future.


6.  Conclusion 

Intel has designed two new processors: the Pentium III and the IA-64 (Merced) processor. Both are
expected to co-exist for a while; the Pentium III processor will be for the average home user and the
Merced processor will be for the workstation or large database user. The Pentium III adds advanced
features over its predecessors while still maintaining a CISC architecture. The Merced processor adds
hardware advances over the Pentium III and also lets software handle many of the tasks that hardware
used to perform. The Merced processor uses the EPIC (Explicitly Parallel Instruction Computing) format,
which combines CISC, RISC, and VLIW architecture. It is Intel's intention that the new design will
become a standard for computers in the future. For this to happen, however, the Merced chip must be
proven to outperform other 64-bit systems. Software companies will have to rewrite code to take
advantage of the Merced processor. Compatibility between processors will no longer be available.
Companies will have to develop multiple program bases and tools to provide programs for different
processors. This will not happen unless the consumer accepts the Merced processor.